PR #64 (Draft): feat: Unified Telemetry Layer for Non-LangGraph Trace Pipelines (M2)
mjehanzaib999 wants to merge 66 commits into AgentOpt:experimental
Conversation
…tion do not lose initial node to optimize (TODO: trainer might have a better solution)
…a lot of logs for further analysis
…ns and doc evaluation hooks
- Add T1 technical plan for LangGraph OTEL Instrumentation API
- Add architecture & strategy doc (unified OTEL instrumentation design)
- Add M0 README with before/after boilerplate reduction comparison
- Add feedback analysis and API strategy comparison (trace-first, dual semconv)
- Add prototype_api_validation.py with a real LangGraph StateGraph + OpenRouter/StubLLM
- Add Jupyter notebook (prototype_api_validation.ipynb) for a Colab-ready demo
- Add example trace output JSON files (notebook_trace_output, optimization_traces)
- Add .env.example for OpenRouter configuration
- Replace hardcoded API key with 3-tier auto-lookup (Colab Secrets → env → .env)
- Save all trace outputs to RUN_FOLDER (Google Drive on Colab, local fallback)
- Add run_summary.json export with scores and history
- Update configuration docs with key-setup priority table
- Fix Colab badge URL with actual repo/branch path
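The 3-tier key lookup described above can be sketched as follows. This is a minimal illustration, not the notebook's actual helper: the function name `resolve_api_key` and the `.env` parsing are assumptions; only the Colab Secrets → environment → `.env` priority order comes from the commit.

```python
import os
from typing import Optional

def resolve_api_key(name: str = "OPENROUTER_API_KEY") -> Optional[str]:
    """Resolve an API key via a 3-tier lookup:
    Colab Secrets -> environment variable -> local .env file."""
    # Tier 1: Colab Secrets (only available inside Google Colab).
    try:
        from google.colab import userdata  # type: ignore
        value = userdata.get(name)
        if value:
            return value
    except Exception:
        pass  # not running on Colab, or secret not set
    # Tier 2: plain environment variable.
    if os.environ.get(name):
        return os.environ[name]
    # Tier 3: .env file in the working directory (KEY=value lines).
    if os.path.exists(".env"):
        with open(".env") as fh:
            for line in fh:
                line = line.strip()
                if line.startswith(f"{name}="):
                    return line.split("=", 1)[1].strip().strip('"')
    return None
```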
…ace/io/otel_adapter.py
Deliver Milestone 1 — drop-in OTEL instrumentation and end-to-end optimization for any LangGraph agent via two function calls.

New modules (opto/trace/io/):
- instrumentation.py: instrument_graph() + InstrumentedGraph wrapper
- optimization.py: optimize_graph() loop + EvalResult/EvalFn contracts
- telemetry_session.py: TelemetrySession (TracerProvider + flush/export)
- bindings.py: Binding dataclass + apply_updates() + make_dict_binding()
- otel_semconv.py: emit_reward(), emit_trace(), record_genai_chat()

Modified modules:
- langgraph_otel_runtime.py: TracingLLM dual semconv (param.* parent + gen_ai.* child spans with trace.temporal_ignore)
- __init__.py: export all new M1 public APIs

Tests (63 passing, StubLLM-only, CI-safe):
- Unit tests for bindings, semconv, session, instrumentation, optimization
- E2E integration test (test_e2e_m1_pipeline.py): real LangGraph with StubLLM proving the full pipeline: instrument → invoke → OTLP → TGJ → optimizer → apply_updates → re-invoke with the updated template

Notebook + docs:
- 01_m1_instrument_and_optimize.ipynb: dual-mode (StubLLM + live OpenRouter), Colab badge, executed outputs, <=3-item dataset, temperature=0, max_tokens=256 budget guard
- docs/m1_README.md: architecture, API reference, data flow, semantic conventions, acceptance criteria status
- requirements.txt: pinned dependencies for uv/pip environments
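The "two function calls" shape can be sketched with simplified stand-ins. The real `instrument_graph()`/`optimize_graph()` live in `opto/trace/io/` and additionally open root spans, export OTLP, convert to TGJ, and run the optimizer; the stubs below only show the caller-facing contract and the `EvalResult` scoring loop, with all internals elided.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict

@dataclass
class EvalResult:
    score: float
    feedback: str = ""

@dataclass
class InstrumentedGraph:
    graph: Callable[[Dict[str, Any]], Dict[str, Any]]
    input_key: str = "query"
    output_key: str = "answer"

    def invoke(self, state: Dict[str, Any]) -> Dict[str, Any]:
        # Real version: opens a root invocation span and exports OTLP spans.
        return self.graph(state)

def instrument_graph(graph, **kwargs) -> InstrumentedGraph:
    return InstrumentedGraph(graph, **kwargs)

def optimize_graph(graph, dataset, eval_fn, iterations=1) -> float:
    # Real version: after each pass, flush spans -> TGJ -> optimizer ->
    # apply_updates(); here only the scoring loop is shown.
    best_score = float("-inf")
    for _ in range(iterations):
        scores = []
        for query, target in dataset:
            out = graph.invoke({graph.input_key: query})
            scores.append(eval_fn(out[graph.output_key], target).score)
        best_score = max(best_score, sum(scores) / len(scores))
    return best_score

# The two calls a caller actually makes:
g = instrument_graph(lambda s: {"answer": s["query"].upper()})
best = optimize_graph(
    g, [("hi", "HI")],
    lambda out, tgt: EvalResult(1.0 if out == tgt else 0.0),
)
```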
A. Live mode error handling:
- A1: TracingLLM raises LLMCallError on HTTP errors/empty content instead of passing error strings as assistant content
- A2: Notebook only prints [OK] when provider call actually succeeds with non-empty content
- A3: gen_ai.provider.name correctly set to "openrouter" (not "openai") when using OpenRouter
- A4: optimize_graph forces score=0 on invocation failure, bypassing eval_fn
B. TelemetrySession API correctness + redaction:
- B5: flush_otlp(clear=False) properly peeks at spans without clearing the exporter
- B6: span_attribute_filter now applied during flush_otlp; supports drop (return {}), redact, and truncate
C. TGJ/ingest correctness and optimizer safety:
- C7: _deduplicate_param_nodes() strips numeric suffixes to collapse duplicate ParameterNodes
- C8: _select_output_node() excludes child LLM spans, selects the true sink (synthesizer)
D. OTEL topology and temporal chaining:
- D9: Root invocation span wraps graph.invoke(), producing a single trace ID per invocation
- D10: Temporal chaining uses trace.temporal_ignore attribute instead of OTEL parent presence
E. optimize_graph semantics + trace-linked reward:
- E11: best_parameters is a real snapshot captured at the best-scoring iteration
- E12: eval.score attached to root invocation span before flush, linking reward to trace
F. Non-saturating scoring for Stub mode:
- F13: StubLLM and eval_fn are structure-aware; stub optimization demonstrates score improvement
Files changed:
- langgraph_otel_runtime.py: LLMCallError, _validate_content, flush_otlp(clear=)
- telemetry_session.py: flush_otlp delegation, _apply_attribute_filter
- otel_adapter.py: root span exclusion, trace.temporal_ignore chaining
- instrumentation.py: _root_invocation_span context manager, root span on invoke/stream
- optimization.py: _deduplicate_param_nodes, _select_output_node, _snapshot_parameters, eval-in-trace
- __init__.py: export LLMCallError
- test_optimization.py: updated for best_parameters field
- 01_m1_instrument_and_optimize.ipynb: all fixes reflected in notebook
- test_client_feedback_fixes.py: 20 new tests covering all 13 issues
… code

Make the instrumentation layer fully generic and provider-agnostic:
- TracingLLM: default provider_name "openai" → "llm"; default llm_span_name "openai.chat.completion" → "llm.chat.completion"
- init_otel_runtime: default service_name "trace-langgraph-demo" → "trace-otel-runtime"
- DEFAULT_EVAL_METRIC_KEYS: remove example-specific "plan_quality", add generic "score"
- instrument_graph: add llm_span_name, input_key, output_key parameters so callers explicitly configure provider/schema specifics
- InstrumentedGraph: add input_key field; invoke()/stream() use it instead of hardcoded "query" for the root span hint
- optimize_graph: add output_key parameter; _make_state uses graph.input_key instead of hardcoded "query"; error fallback no longer assumes result["answer"]
- _select_output_node: replace hardcoded "openai"/"chat.completion" name checks with the trace.temporal_ignore attribute from info.otel
- otel_adapter: propagate the temporal_ignore flag into the TGJ info dict
- tgj_ingest: preserve info.otel metadata through conversion and onto MessageNode objects

Tests and notebook updated to explicitly pass example-specific values (provider_name, llm_span_name, output_key) rather than relying on defaults. All 88 tests pass.
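The `_select_output_node` change above can be illustrated with a small sketch. The node shape (dicts with `id`, `parents`, and an `info.otel` map) and the sink-selection heuristic are assumptions for illustration; the commit only establishes that provider child spans are excluded via `trace.temporal_ignore` rather than by matching span names like "openai".

```python
def select_output_node(nodes):
    """Pick the sink node of a converted trace graph, skipping spans
    flagged with trace.temporal_ignore in info.otel (provider-specific
    child spans), instead of matching hardcoded provider span names."""
    candidates = [
        n for n in nodes
        if not n.get("info", {}).get("otel", {}).get("trace.temporal_ignore")
    ]
    # A sink is a candidate that no other candidate lists as a parent.
    referenced_parents = {p for n in candidates for p in n.get("parents", [])}
    sinks = [n for n in candidates if n["id"] not in referenced_parents]
    return sinks[0] if sinks else None
```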
…st iteration

Previously, best_updates was overwritten on every iteration where updates were applied, regardless of whether that iteration achieved the best score. This caused best_updates to always contain the last applied updates rather than the updates that produced the best-performing parameters.

Introduce last_applied_updates to track the most recently applied updates separately, and snapshot it at the start of each iteration as applied_updates_for_this_iter. best_updates is now assigned only inside the best-score guard (avg_score > best_score), ensuring it accurately reflects the updates that led to best_parameters.

Addresses PR feedback item #1: optimize_graph() best_updates tracking.
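The corrected bookkeeping can be sketched in isolation. `run_one` is a hypothetical stand-in for a single optimization iteration returning `(avg_score, new_updates)`; the variable names `last_applied_updates` and `applied_updates_for_this_iter` come from the commit message, the rest is illustrative.

```python
def run_iterations(iterations, run_one):
    """Updates applied at the end of iteration i take effect in iteration
    i+1, so each iteration snapshots the updates it *entered* with, and
    best_updates is assigned only inside the best-score guard."""
    best_score = float("-inf")
    best_updates = None
    last_applied_updates = None
    for it in range(iterations):
        # Snapshot at the start: the updates in effect for this iteration.
        applied_updates_for_this_iter = last_applied_updates
        avg_score, new_updates = run_one(it)
        if new_updates is not None:
            last_applied_updates = new_updates
        if avg_score > best_score:  # best-score guard
            best_score = avg_score
            best_updates = applied_updates_for_this_iter
    return best_score, best_updates

# Scores peak at iteration 1, which ran with "u1" applied, so best_updates
# must be "u1" even though "u2" and "u3" were applied afterwards.
history = [(0.5, "u1"), (0.9, "u2"), (0.7, "u3")]
best_score, best_updates = run_iterations(3, lambda i: history[i])
```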
optimize_graph() previously ignored the graph's configured output_key unless the caller explicitly passed output_key=..., causing incorrect eval payload shape. Now auto-inherits graph.output_key when the parameter is not provided, and logs a debug note when an explicit override disagrees with the graph's configuration. Addresses PR feedback item doxav#2: output_key fallback in optimize_graph.
enable_code_optimization was accepted by instrument_graph() but never used — TracingLLM.emit_code_param always remained None. Now constructs a _emit_code_param callback when the flag is True that emits source code, SHA-256 hash, truncation metadata, and trainable marker as param.__code_* span attributes. Source is capped at 10K chars with truncation flag. Addresses PR feedback item doxav#3: enable_code_optimization no-op.
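The emitted payload can be sketched as a plain attribute dict. The `param.__code_*` attribute names, the SHA-256 hash, the 10K-char cap, and the trainable marker come from the commit; the helper name and exact dict shape are assumptions.

```python
import hashlib
import inspect

def make_code_param_attrs(fn, max_chars=10_000):
    """Sketch of the code-parameter span payload: source text, SHA-256
    hash, truncation metadata, and a trainable marker, shaped as
    param.__code_* attributes."""
    source = inspect.getsource(fn)
    truncated = len(source) > max_chars
    return {
        "param.__code_source": source[:max_chars],  # capped at 10K chars
        "param.__code_sha256": hashlib.sha256(source.encode("utf-8")).hexdigest(),
        "param.__code_truncated": truncated,
        "param.__code_trainable": True,
    }
```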
(4A) otel_adapter: after temporal hierarchy resolution, null out effective_psid when it still references a skipped root invocation span, preventing dangling parent edges in the TGJ graph.

(4B) langgraph_otel_runtime: capture the child LLM span ref and propagate error/error.type attributes to it on LLMCallError and unexpected exceptions, so OTEL UIs correctly flag the LLM call as failed.

Addresses PR feedback item doxav#4.
…race validation

Notebook trace validation used "openai" in name to detect child spans, which silently matched nothing after the generic refactoring. Now uses the trace.temporal_ignore attribute for provider-agnostic detection and asserts the set is non-empty. Also adds a root invocation span assertion to enforce the D9 single-trace-ID invariant. Addresses PR feedback item doxav#6.
…into m1-for-upstream
…e spans
Library (langgraph_otel_runtime.py):
- Restructure child LLM span error handling: catch errors inside the child span context manager so attributes are set before the span ends
- Add error.message attribute (truncated to 500 chars) on both parent and child spans for LLMCallError and unexpected exceptions

Notebook (01_m1_instrument_and_optimize.ipynb):
- Rewrite graph to a 6-node architecture aligned with the reference demo: planner → executor → web_researcher/wikidata_researcher → synthesizer → evaluator
- Use Command routing from langgraph.types for dynamic node dispatch
- Switch to DEMO_QUERIES (French Revolution / Tesla / CRISPR)
- Add 3 trainable templates (planner, executor, synthesizer) with output_key=final_answer
- Rewrite StubLLM to produce JSON plans, routing JSON, and topic-aware answers; respond to prompt-template changes for non-saturating scoring
- Rewrite stub_eval_fn: base 0.2 + plan richness + answer length, cap 0.95
- Fix live section: provider_name="openrouter", trace invariant checks, only print [OK] on actual success
- Fix ParameterNode deduplication in TGJ inspection (id-based dedup)
- Update Colab Drive paths to OpenTrace_runs/M1/{OPENTRACE_REF}
- Add optimization table output (iteration → avg_score → best_score)

Verified: 41 tests pass, notebook runs end-to-end, baseline=0.75 → best=0.95
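The non-saturating stub scorer can be sketched as below. The base 0.2, the 0.95 cap, and the "plan richness + answer length" components come from the commit; the specific weights and the JSON plan shape (`steps` key) are illustrative assumptions, not the notebook's exact function.

```python
import json

def stub_eval_fn(plan_json: str, answer: str) -> float:
    """Non-saturating stub score: base 0.2, plus a plan-richness bonus,
    plus an answer-length bonus, capped at 0.95 so the optimizer always
    has visible headroom to improve against."""
    score = 0.2  # base
    try:
        plan = json.loads(plan_json)
        # Plan richness: reward up to 3 plan steps.
        score += min(len(plan.get("steps", [])) * 0.1, 0.3)
    except json.JSONDecodeError:
        pass  # unparseable plan earns no richness bonus
    # Answer length: reward longer answers, saturating at 1000 chars.
    score += min(len(answer) / 1000.0, 0.45)
    return min(score, 0.95)  # cap
```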
apply_updates() now normalizes ParameterNode object keys to strings via _normalize_key(), so OptoPrimeV2 updates are no longer silently skipped. ingest_tgj() gains a param_cache to reuse stable ParameterNode instances across multi-query iterations. The backward pass now iterates all output nodes, and stale OTLP spans are flushed at the start of optimize_graph().

- bindings.py: accept Dict[Any, Any], return the applied dict
- tgj_ingest.py: add param_cache kwarg for ParameterNode reuse
- optimization.py: flush stale spans, use param_cache, fix the backward loop, use the applied dict from apply_updates()
- notebook: enable INFO logging in the live optimization cell
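The key-normalization fix can be sketched as follows. This assumes ParameterNode exposes a `.name` attribute usable as a stable string key; the binding table is modeled as a plain dict, and the "return the applied dict" behavior matches the bindings.py bullet above.

```python
def _normalize_key(key):
    """Normalize ParameterNode objects (or anything with a .name
    attribute) to plain string keys so optimizer updates match the
    binding table instead of being silently skipped."""
    return key if isinstance(key, str) else getattr(key, "name", str(key))

def apply_updates(bindings, updates):
    """Apply optimizer updates to a dict of bindings.
    Returns the subset of updates that were actually applied."""
    applied = {}
    for key, value in updates.items():
        name = _normalize_key(key)
        if name in bindings:
            bindings[name] = value
            applied[name] = value
    return applied
```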
The GraphPropagator asserts that user_feedback is identical when aggregating across multiple backward passes. Running zero_feedback → backward → step per query (matching the BBEH notebook pattern) avoids this and lets each query contribute updates independently.
…optimizer steps
Replace the per-query backward/step loop with Trace's canonical minibatch pattern: batchify all output nodes into a single batched target and all per-query feedback into a single batched feedback string, then call backward() and step() once. This avoids the GraphPropagator assertion ("user feedback should be the same for all children") while ensuring all queries' graph paths contribute to the optimization gradient.

The batchify import is lazy-loaded via _ensure_trace_imports() to avoid pulling in numpy and the trainer package at module level.
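The batched shape can be sketched with simple stand-ins. Trace's real `batchify` helper is replaced here by a tuple and a joined feedback string; the point being illustrated is only that `backward()` and `step()` run exactly once per minibatch, which is what sidesteps GraphPropagator's identical-feedback assertion.

```python
def minibatch_step(output_nodes, feedbacks, backward, step):
    """Combine all per-query outputs into one batched target and all
    per-query feedback into one batched feedback string, then run a
    single backward pass and a single optimizer step."""
    batched_target = tuple(output_nodes)
    batched_feedback = "\n".join(
        f"[query {i}] {fb}" for i, fb in enumerate(feedbacks)
    )
    backward(batched_target, batched_feedback)  # one backward pass
    step()                                      # one optimizer step

# Record calls to verify the one-backward/one-step contract.
calls = []
minibatch_step(
    ["out_a", "out_b"],
    ["too short", "missing citation"],
    backward=lambda target, fb: calls.append(("backward", target, fb)),
    step=lambda: calls.append(("step",)),
)
```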
Implement TelemetrySession activation via contextvars so @trace.bundle ops and MessageNode creation can emit OTEL spans outside LangGraph.

- Add BundleSpanConfig and MessageNodeTelemetryConfig to control span emission and node-to-span binding (message.id)
- Add bundle_span() context manager and on_message_node_created() hook in TelemetrySession for non-LangGraph OTEL visibility
- Wrap sync_forward/async_forward in an optional OTEL span when a session is active
- Emit a temporal-ignore child span in call_llm for provider monitoring
- Activate the session inside the InstrumentedGraph root span so Trace hooks discover it automatically
- Add opto.features.mlflow with autolog/disable_autolog (safe no-op when MLflow is not installed)
- Add opto.trace.settings for the global MLflow toggle
- Align export naming to otlp.json/tgj.json with legacy aliases
- Add manifest.json and message_nodes.jsonl to the export bundle
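The `bundle_span()` / `on_message_node_created()` pair can be sketched with simplified stand-ins (these classes are illustrations, not the real TelemetrySession or OTEL span API): a bundle op opens a span, and a MessageNode created while that span is still open gets bound to it via a `message.id` attribute, giving TGJ conversion a stable node identity.

```python
from contextlib import contextmanager

class _Span:
    """Minimal stand-in for an OTEL span."""
    def __init__(self, name):
        self.name = name
        self.attributes = {}
    def set_attribute(self, key, value):
        self.attributes[key] = value

class SessionSketch:
    """Simplified stand-in for TelemetrySession's span hooks."""
    def __init__(self):
        self.finished_spans = []
        self._active = None

    @contextmanager
    def bundle_span(self, op_name):
        span = _Span(f"trace.bundle.{op_name}")
        self._active = span
        try:
            yield span
        finally:
            self.finished_spans.append(span)
            self._active = None

    def on_message_node_created(self, node_id):
        # Bind the new MessageNode to the currently open span; if no span
        # is open, the binding is silently skipped.
        if self._active is not None:
            self._active.set_attribute("message.id", node_id)

session = SessionSketch()
with session.bundle_span("add"):
    # Node creation must happen while the span is still open.
    session.on_message_node_created("msg-0001")
```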
Covers all M2 features: TelemetrySession activation, bundle span emission, default-op silencing, MessageNode binding, call_llm temporal-ignore spans, export bundle naming, MLflow autolog API, M1 non-breaking compatibility, and end-to-end non-LangGraph pipeline. Includes live OpenRouter sections (auto-skipped if no API key).
…ooks and remove stale files

Move 02_m2_unified_telemetry.ipynb into examples/notebooks/ for consistency with the M1 notebook location. Remove leftover files from the repo root: the M1 notebook copy, OVERVIEW.md, and PR diff files.
…ng works

postprocess_output (which creates the MessageNode) was called after the bundle span had closed, so on_message_node_created could never find an active span to attach message.id to. Move it inside the span_cm block for both sync_forward and async_forward.
The install cell only cloned on first run but never pulled updates when the repo folder already existed, causing stale code to persist across runtime restarts. Added git fetch + pull so the clone is brought up to date on every run.
- Updated M2 notebook install cell to add repo root to sys.path when running locally, eliminating the need for pip install - Added git fetch + pull to Colab install cell so restarts pick up latest commits instead of using stale cloned code - Removed debug probe from MessageNode binding cell - Relaxed setup.py python_requires from >=3.13 to >=3.12
Added Sections 8a-8c that install MLflow and validate real integration paths: autolog enabling, bundle wrapping via mlflow.trace(), artifact logging via TelemetrySession, and log_metric/log_param recording.
… compatibility

mlflow.trace() wrapping accesses fn.__name__ on the decorated callable. FunModule (the object returned by @Bundle) did not expose this attribute, causing an AttributeError when executing bundle-decorated functions inside an active MLflow run. Forward the original function's __name__ and __qualname__ onto the FunModule instance.
…flow.trace() - Add Section 8d to M2 notebook: launches MLflow UI inline on Colab (port 5000) for visual inspection of experiments, runs, artifacts, and metrics. Falls back to terminal instructions when running locally. - Expose __name__ and __qualname__ on FunModule so mlflow.trace() can resolve the function name without AttributeError. - Update notebook summary tables (header + footer) to include Section 8d.
…n 8d) Renders an embedded iframe and a direct "Open in new tab" link using Colab's proxyPort API so users can visually inspect MLflow experiments, runs, artifacts, and metrics logged by the preceding test cells. Also exposes __name__/__qualname__ on FunModule to fix AttributeError when mlflow.trace() wraps @bundle-decorated functions.
Replace proxyPort-based link (blocked by Colab pop-up blocker) with subprocess.Popen + serve_kernel_port_as_iframe for reliable inline rendering of the MLflow UI in notebook output.
The call was passing unsupported kwargs (operation, output_messages, response, temperature, max_tokens) which silently raised TypeError under the bare except, leaving gen_ai.input.messages and gen_ai.output.messages unset. Use the actual signature parameters (provider, model, input_messages, output_text) so the semconv attributes are recorded on the LLM child span.
Replace single _token with _token_stack list so that nesting with session: on the same TelemetrySession instance correctly restores the context variable on each exit instead of leaking the active session.
Allow activating a TelemetrySession without indenting all pipeline code under a with-block. Useful in notebooks and long scripts. Both methods share the _token_stack so they compose safely with context-manager activation and nested calls.
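The `_token_stack` design described in the last two commits can be sketched with contextvars. The class below is a simplified illustration of the mechanism, not the real TelemetrySession: both activation styles push `ContextVar.set()` reset tokens onto one stack, so `with`-blocks, `activate()`/`deactivate()` calls, and nested activations on the same instance all compose without leaking the active session.

```python
import contextvars
from typing import Optional

_current = contextvars.ContextVar("telemetry_session", default=None)

class TelemetrySessionSketch:
    """Context-manager and explicit activation sharing one token stack."""
    def __init__(self):
        self._token_stack = []

    @classmethod
    def current(cls) -> Optional["TelemetrySessionSketch"]:
        return _current.get()

    def activate(self):
        # Push a reset token so deactivation restores the previous value.
        self._token_stack.append(_current.set(self))
        return self

    def deactivate(self):
        _current.reset(self._token_stack.pop())

    def __enter__(self):
        return self.activate()

    def __exit__(self, *exc):
        self.deactivate()
```

Because every entry path pushes its own token, exiting a nested `with` restores whatever was active before it, rather than clearing the session outright.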
… cells

Add Section 8e validating MessageNodeTelemetryConfig(mode="span"), which creates dedicated spans when no active span exists. Add Section 8f validating the full OTLP -> TGJ -> ingest_tgj() round-trip that underpins the optimization data path. Update the header and summary tables accordingly.
…_signature__

MLflow's capture_function_input_args uses inspect.signature(func) to bind args. FunModule inherited Module.__call__(self, *args, **kwargs), so inspect returned the wrong signature and binding failed or produced bad data. Set __signature__ = inspect.signature(fun) so MLflow sees the real parameter names (x, y) and can capture inputs correctly. Remove the previous warning suppression and note from the notebook.
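The fix relies on `inspect.signature()` honoring a `__signature__` attribute set on the callable instance. A minimal sketch (the class below is a stand-in for FunModule, not the real implementation):

```python
import inspect

class FunModuleSketch:
    """Wrapper that forwards the wrapped function's identity so
    inspect-based tooling (e.g. MLflow's input capture) sees the real
    parameter names instead of (*args, **kwargs)."""
    def __init__(self, fun):
        self.fun = fun
        self.__name__ = fun.__name__          # fixes mlflow.trace() lookup
        self.__qualname__ = fun.__qualname__
        # Without this, inspect.signature(self) falls back to
        # __call__(self, *args, **kwargs) and argument binding goes wrong.
        self.__signature__ = inspect.signature(fun)

    def __call__(self, *args, **kwargs):
        return self.fun(*args, **kwargs)

def add(x, y):
    return x + y

wrapped = FunModuleSketch(add)
```

`inspect.signature()` checks the object's `__signature__` attribute before falling back to `type(obj).__call__`, which is why setting it on the instance is sufficient.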
Remove redundant sp.set_attribute("gen_ai.operation.name", "chat") from call_llm() in operators.py — record_genai_chat() already sets this to "chat.completion" per OTEL GenAI semantic conventions. Update the notebook assertion to expect "chat.completion" accordingly.
Strip tqdm progress-bar widget-view outputs from cell 27 (MLflow artifact download) — GitHub's nbformat renderer rejects notebooks with metadata.widgets entries missing the required 'state' key.
Summary
This PR implements the "Generic Unified Telemetry" layer (Milestone 2), enabling OTEL span emission for non-LangGraph Trace pipelines while preserving all existing LangGraph instrumentation behavior.
After M1, only LangGraph pipelines could emit OTEL spans. This PR extends telemetry coverage so that any Trace pipeline using `@trace.bundle` or `call_llm` can produce OTEL-compatible spans when a `TelemetrySession` is active — with zero changes to existing code when no session is active.

What's new
- Session activation via `contextvars` — `TelemetrySession` supports the `with` context manager and `activate()` for global discovery by Trace hooks
- OTEL spans for `@trace.bundle` ops — controlled by `BundleSpanConfig` (enable/disable, suppress default ops, capture inputs)
- `MessageNodeTelemetryConfig` binds `message.id` to the current span for stable node identity in TGJ conversion
- `call_llm` provider span — emits a child OTEL span with `trace.temporal_ignore=true` when a session is active (visible for monitoring, excluded from output node selection)
- `InstrumentedGraph._root_invocation_span` now calls `session.activate()` so Trace-level hooks discover the session automatically
- `opto.features.mlflow.autolog()` enables `mlflow.trace` wrapping on bundle ops; safe no-op when MLflow is not installed
- `export_run_bundle()` now writes `otlp.json`/`tgj.json` (aligned with repo demos) with backward-compatible aliases (`otlp_trace.json`/`trace_graph.json`)
- `manifest.json` and `message_nodes.jsonl` included in the export bundle for debugging

Files changed (9 files, +664 / -71)
- `opto/trace/settings.py`
- `opto/features/mlflow/__init__.py`
- `opto/features/mlflow/autolog.py`: `autolog()`/`disable_autolog()`
- `opto/trace/__init__.py`: `settings` and `mlflow` in public API
- `opto/trace/bundle.py`: `sync_forward`/`async_forward`; MLflow `mlflow.trace` wrapping
- `opto/trace/io/telemetry_session.py`: `BundleSpanConfig`, `MessageNodeTelemetryConfig`, span helpers, MLflow helpers, export alignment
- `opto/trace/io/instrumentation.py`: `session.activate()`
- `opto/trace/nodes.py`: `MessageNode.__init__` to call `on_message_node_created()`
- `opto/trace/operators.py`: `call_llm` emits temporal-ignore provider span

Non-breaking guarantees
- All new behavior is gated by `TelemetrySession.current() is None` checks
- `postprocess_output` signature unchanged — preserves compatibility with existing callers
- `preprocess_inputs` preserved — data extraction inside the `trace_nodes` context is untouched

Test plan
- `opto.trace` import works without errors
- `TelemetrySession` + `BundleSpanConfig` + `MessageNodeTelemetryConfig` import correctly
- Bundle spans carry `trace.bundle.*` and `inputs.*` attributes
- `TelemetrySession.current()` returns `None` outside the context, the active session inside
- `export_run_bundle()` produces `otlp.json`, `tgj.json`, `manifest.json` + legacy aliases
- `autolog(silent=True)` gracefully disables when MLflow is not installed
- End-to-end demo notebook (`generic_unified_telemetry_demo.ipynb`)